Off-Policy Actor-Critic with Emphatic Weightings
A variety of theoretically-sound policy gradient algorithms exist for the
on-policy setting due to the policy gradient theorem, which provides a
simplified form for the gradient. The off-policy setting, however, has been
less clear due to the existence of multiple objectives and the lack of an
explicit off-policy policy gradient theorem. In this work, we unify these
objectives into one off-policy objective, and provide a policy gradient theorem
for this unified objective. The derivation involves emphatic weightings and
interest functions. We present multiple strategies for approximating the
gradients in an algorithm called Actor Critic with Emphatic weightings (ACE).
We prove, via a counterexample, that previous (semi-gradient) off-policy
actor-critic methods, particularly Off-Policy Actor-Critic (OffPAC) and
Deterministic Policy Gradient (DPG), converge to the wrong solution, whereas
ACE finds the optimal
solution. We also highlight why these semi-gradient approaches can still
perform well in practice, suggesting strategies for variance reduction in ACE.
We empirically study several variants of ACE on two classic control
environments and an image-based environment designed to illustrate the
tradeoffs made by each gradient approximation. We find that by approximating
the emphatic weightings directly, ACE performs as well as or better than OffPAC
in all settings tested.
Comment: 63 pages
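The emphatic weighting at the core of ACE builds on the followon trace of emphatic TD. The sketch below is a minimal illustration of that recursion, F_t = i(S_t) + gamma * rho_{t-1} * F_{t-1}, not the paper's full actor update; the discounts, importance-sampling ratios, and interest values are hypothetical.

```python
def emphatic_trace_update(F_prev, rho_prev, gamma, interest):
    """One step of the followon-trace recursion from emphatic TD:
    F_t = i(S_t) + gamma * rho_{t-1} * F_{t-1}."""
    return interest + gamma * rho_prev * F_prev

# Accumulate the trace along a short (hypothetical) off-policy trajectory.
gammas = [0.9, 0.9, 0.9]      # per-step discounts
rhos = [1.0, 0.5, 2.0]        # importance-sampling ratios pi/mu
interests = [1.0, 1.0, 1.0]   # uniform interest over states

F, rho_prev = 0.0, 1.0        # no correction before the first step
for gamma, rho, interest in zip(gammas, rhos, interests):
    F = emphatic_trace_update(F, rho_prev, gamma, interest)
    rho_prev = rho
```

The trace up-weights states reached through trajectories the target policy would have followed, which is what corrects the semi-gradient bias the abstract describes.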
The Utility of Sparse Representations for Control in Reinforcement Learning
We investigate sparse representations for control in reinforcement learning.
While these representations are widely used in computer vision, their
prevalence in reinforcement learning is limited to sparse coding where
extracting representations for new data can be computationally intensive. Here,
we begin by demonstrating that learning a control policy incrementally with a
representation from a standard neural network fails in classic control domains,
whereas learning with a representation obtained from a neural network that has
sparsity properties enforced is effective. We provide evidence that the reason
for this is that the sparse representation provides locality, and so avoids
catastrophic interference, and particularly keeps consistent, stable values for
bootstrapping. We then discuss how to learn such sparse representations. We
explore the idea of Distributional Regularizers, where the activation of hidden
nodes is encouraged to match a particular distribution that results in sparse
activation across time. We identify a simple but effective way to obtain sparse
representations, one not afforded by previously proposed strategies, making
further investigation into sparse representations for reinforcement learning
more practical.
Comment: Association for the Advancement of Artificial Intelligence 201
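One way to realize a Distributional Regularizer is a KL-divergence penalty pushing each hidden unit's mean activation towards a target Bernoulli distribution, as in sparse autoencoders. This sketch assumes that instantiation; the paper considers other target distributions, and the function names here are illustrative.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-8):
    """Elementwise KL(Bernoulli(p) || Bernoulli(q))."""
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def distributional_regularizer(activations, target_sparsity=0.1):
    """Penalty encouraging each hidden unit's mean activation over a batch
    to match a Bernoulli(target_sparsity) distribution, so that units are
    active only rarely. `activations` is a (batch, hidden) array in [0, 1]."""
    mean_act = activations.mean(axis=0)
    return float(kl_bernoulli(target_sparsity, mean_act).sum())

# Activations already at the target sparsity incur (near) zero penalty;
# denser activations are penalized.
sparse_penalty = distributional_regularizer(np.full((4, 3), 0.1))
dense_penalty = distributional_regularizer(np.full((4, 3), 0.5))
```

In training, this penalty would be added to the usual value-learning loss with a small coefficient, trading off prediction accuracy against activation sparsity.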
From eye-blinks to state construction: diagnostic benchmarks for online representation learning
We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state representation that summarizes its interaction history. Recurrent neural networks can automatically construct state and learn temporal associations. However, the current training methods are prohibitively expensive for online prediction (continual learning on every time step), which is the focus of this paper. Our proposed problems test the learning capabilities that animals readily exhibit and highlight the limitations of the current recurrent learning methods. While the proposed problems are nontrivial, they are still amenable to extensive testing and analysis in the small-compute regime, thereby enabling researchers to study issues in isolation, ultimately accelerating progress towards scalable online representation learning methods.
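A classical-conditioning prediction stream of this flavor is easy to simulate: a conditioned stimulus (CS) pulse is always followed, after a fixed inter-stimulus interval, by an unconditioned stimulus (US), so predicting the US requires memory of the CS. The generator below is an illustrative construction in that spirit, not one of the paper's three benchmarks; all parameter names are assumptions.

```python
import random

def trace_conditioning_stream(isi=5, iti_range=(10, 20), steps=200, seed=0):
    """Yield (cs, us) bits per time step: each CS pulse is followed by a US
    exactly `isi` steps later, with a random inter-trial interval between
    CS onsets. A memoryless predictor sees cs == 0 at every US time, so it
    cannot anticipate the US without constructing state."""
    rng = random.Random(seed)
    t, next_cs = 0, rng.randint(*iti_range)
    us_times = set()
    while t < steps:
        cs = 1 if t == next_cs else 0
        if cs:
            us_times.add(t + isi)
            next_cs = t + isi + rng.randint(*iti_range)
        us = 1 if t in us_times else 0
        yield cs, us
        t += 1

stream = list(trace_conditioning_stream())
```

An online learner is then scored on its per-step prediction of the upcoming US signal, which is exactly the continual, every-time-step regime the abstract targets.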
Investigating the Properties of Neural Network Representations in Reinforcement Learning
In this paper we investigate the properties of representations learned by
deep reinforcement learning systems. Much of the early work on representations
for reinforcement learning focused on designing fixed-basis architectures to
achieve properties thought to be desirable, such as orthogonality and sparsity.
In contrast, the idea behind deep reinforcement learning methods is that the
agent designer should not encode representational properties, but rather that
the data stream should determine the properties of the representation -- good
representations emerge under appropriate training schemes. In this paper we
bring these two perspectives together, empirically investigating the properties
of representations that support transfer in reinforcement learning. We
introduce and measure six representational properties over more than 25
thousand agent-task settings. We consider Deep Q-learning agents with different
auxiliary losses in a pixel-based navigation environment, with source and
transfer tasks corresponding to different goal locations. We develop a method
to better understand why some representations work better for transfer, through
a systematic approach varying task similarity and measuring and correlating
representation properties with transfer performance. We demonstrate the
generality of the methodology by investigating representations learned by a
Rainbow agent that successfully transfer across game modes in Atari 2600.
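Two of the properties such a study might measure, sparsity and orthogonality, can be computed directly from a matrix of learned features. The metrics below are simplified illustrations of that kind of measurement, not the paper's six exact definitions.

```python
import numpy as np

def instance_sparsity(phi):
    """Fraction of inactive (zero) features, averaged over inputs.
    `phi` is a (num_states, num_features) matrix of representations."""
    return float((phi == 0).mean())

def orthogonality(phi, eps=1e-8):
    """One minus the mean pairwise cosine similarity between the feature
    vectors of distinct states; 1.0 means fully orthogonal features."""
    norms = np.linalg.norm(phi, axis=1, keepdims=True)
    unit = phi / np.maximum(norms, eps)
    sims = unit @ unit.T
    n = phi.shape[0]
    off_diag = sims[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())

# A one-hot (tabular) representation is maximally sparse and orthogonal.
phi = np.eye(4)
sparsity = instance_sparsity(phi)
ortho = orthogonality(phi)
```

Correlating scores like these with transfer performance across task variations is the kind of systematic analysis the abstract describes.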